Active Instance Sampling via Matrix Partition

Author

  • Yuhong Guo
Abstract

Batch-mode active learning has recently attracted considerable attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. Within a Gaussian process framework, this mutual-information-based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by applying an effective local optimization technique to a relaxed continuous version of the problem. The proposed approach is independent of the classification model employed. Our empirical studies show that it can achieve performance comparable or superior to that of discriminative batch-mode active learning methods.
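The abstract's selection criterion can be made concrete with a small sketch. Assume a zero-mean Gaussian process whose prior covariance over all instances is the kernel matrix K. Under that assumption, the mutual information between the outputs of a labeled set L and the remaining unlabeled set U is 0.5 * (log|K_LL| + log|K_UU| - log|K|). The full determinant |K| does not depend on how the instances are split, so choosing a batch amounts to choosing a partition of K. This is the matrix partition view described above.

Several parts of the sketch are illustrative assumptions and do not come from the paper: the RBF kernel, its length scale and noise term, and all function names. The batch is also chosen by a plain greedy search (a swapped-in heuristic) rather than the relaxed continuous optimization the paper uses. Treat it as an illustration of the objective, not the author's algorithm.

```python
import numpy as np


def rbf_kernel(X, length_scale=1.0, noise=1e-2):
    """RBF Gram matrix plus diagonal noise (assumed, illustrative values)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-0.5 * d2 / length_scale**2) + noise * np.eye(len(X))


def logdet(K):
    """log|K| of a symmetric positive-definite matrix; 0.0 for an empty one."""
    if K.shape[0] == 0:
        return 0.0
    _, value = np.linalg.slogdet(K)
    return value


def gp_mutual_information(K, labeled):
    """Gaussian MI between outputs of the labeled set L and its complement U:
    0.5 * (log|K_LL| + log|K_UU| - log|K|)."""
    n = K.shape[0]
    L = np.asarray(sorted(labeled), dtype=int)
    U = np.setdiff1d(np.arange(n), L)
    return 0.5 * (logdet(K[np.ix_(L, L)]) + logdet(K[np.ix_(U, U)]) - logdet(K))


def greedy_batch(K, labeled, batch_size):
    """Hypothetical greedy stand-in for the paper's relaxed optimization:
    repeatedly move the unlabeled index that gives the largest MI."""
    current = set(labeled)
    batch = []
    for _ in range(batch_size):
        candidates = [i for i in range(K.shape[0]) if i not in current]
        if not candidates:
            break  # pool exhausted
        best = max(candidates,
                   key=lambda i: gp_mutual_information(K, current | {i}))
        current.add(best)
        batch.append(best)
    return batch


# Toy usage: 30 random 2-D points, 3 already labeled, select 5 more.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
K = rbf_kernel(X)
batch = greedy_batch(K, labeled={0, 1, 2}, batch_size=5)
print(batch)
```

Each greedy step evaluates three log-determinants per candidate. That cost makes the sketch practical only for small pools, so it shows what is being maximized, not how to scale it.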

Similar resources

High density-focused uncertainty sampling for active learning over evolving stream data

Data labeling is an expensive and time-consuming task; hence, carefully choosing which labels to use for training a model is becoming increasingly important. In the active learning setting, a classifier is trained by querying labels from a small representative fraction of data. While many approaches exist for non-streaming scenarios, few works consider the challenges of the data stream setting. ...

Asymptotics for the number of blocks in a conditional Ewens-Pitman sampling model

The study of random partitions has been an active research area in probability over the last twenty years. A quantity that has attracted a lot of attention is the number of blocks in the random partition. Depending on the area of application, this quantity could represent the number of species in a sample from a population of individuals or the number of cycles in a random permutation, etc. In ...

Nearest-Neighbor-Based Active Learning for Rare Category Detection

Rare category detection is an open challenge for active learning, especially in the de-novo case (no labeled examples), but of significant practical importance for data mining, e.g., detecting new financial transaction fraud patterns, where normal legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-d...

An Importance Sampling Scheme on Dual Factor Graphs - II. Models with Strong Couplings

We consider the problem of estimating the partition function of the two-dimensional ferromagnetic Ising and Potts models in an external magnetic field. The estimation is done via importance sampling in the dual of the Forney factor graph representing the models. We present importance sampling schemes that can efficiently compute an estimate of the partition function in a wide range of model par...

An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids

It has been shown that the minimum free-energy structure for RNAs and RNA-RNA interactions is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble-based quantities such as melting temperature and equilibrium concentrations can be more reliably predicted. Even structure prediction by sampling from the ensemble and clustering ...

Publication date: 2010